SUMMARY


The use of the speech signal for assessing physiological and psychological
changes resulting from 'strain' in pilots and air traffic controllers is explained
and a device is described for tracking one of the parameters of the speech signal,
the fundamental frequency, to quantify changes in this parameter due to 'strain'.

1 INTRODUCTION

The  assessment  of  the  effects  upon  man  of  varying  mental  workloads  is  an 
extremely  complex  problem  particularly  in  view  of  the  many  interactions  involved 
between  task  demands,  environmental  factors  and  the  individual  characteristics  of 
man  himself. 

A  flow  diagram  indicating  the  major  interactions  occurring  is  shown  in 
Fig  1  and  is  based  upon  a  model  of  the  working  system,  in  particular  of  air 
traffic controllers, due to Laurig and described by Rohmert. For present purposes
the term 'stress' is used to describe the input to a 'man at work', of task
demands  and  other  environmental  factors  which  affect  his  performance  such  as 
noise,  vibration,  illumination  and  climate,  and  also  other  demands  imposed  upon 
the  man  such  as  the  responsibilities  of  the  job.  'Stress'  on  interaction  with  a 
man  and  his  individual  characteristics,  such  as  his  level  of  competence,  results 
in  'strain'.  It  is  this  'strain'  imposed  upon  a  man  that  may  result  in  changes 
in  certain  physiological  parameters  (Fig  1). 

This  Report  is  concerned  primarily  with  the  effects  of  mental  stress  rather 
than  physical  stress  and  where  the  term  'workload'  is  used  it  refers  to  mental 
workload. 

Such physiological parameters as heart rate, skin potential, electromyograms,
electroencephalograms and the concentration of catecholamines in the urine
have been used by many researchers in attempts to provide information about the
workload of man, although heart rate is perhaps the most widely used parameter
and the easiest to record continuously. However, when using these
parameters,  particularly  heart  rate,  it  is  still  difficult  to  distinguish  between 
physical  and  mental  workload.  In  the  case  of  heart  rate,  other  parameters  such 
as  oxygen  intake,  temperature  and  catecholamine  metabolism  must  be  measured  when 
attempting  to  distinguish  between  sources  of  heart  rate  changes.  A  further 
disadvantage  of  using  these  physiological  parameters  to  assess  workload  is  that 
measurement  of  them  requires  some  degree  of  interference  with  the  subject.  In 
some  situations  it  may  not  be  very  practicable  to  instrument  the  subject  to 
obtain  all  the  measurements  required  and  even  if  it  were  there  still  remains  a 
possibility  that  the  subject's  motivation  may  be  affected.  Acoustic  analysis  of 
the  speech  signal  of  a  subject  at  work  in  an  attempt  to  identify  changes  in  the 
signal due to 'strain' is attractive since it does not require direct instrumentation
of the subject. In the case of pilots and air traffic controllers a
communication channel already exists and recordings of their speech may be made
for subsequent analysis without interfering in any way with their work; indeed
the subjects themselves need not be aware that anything out of the ordinary is
taking place, thus avoiding any modification to their motivation due to awareness
of measurements being made on their performance.

The human vocal system is a complex one in which the prime function of the
organs making up the system is not that of producing speech at all but one of
survival, as suggested by O'Connor. The lungs transfer oxygen to the blood, the
vocal cords help to prevent foreign bodies entering the trachea whilst the tongue
plays a vital role in chewing and swallowing. It is for this reason that organs
when fulfilling their secondary function of producing speech may also contribute
information to the speech signal which has no linguistic significance but is
rather a reflection of the particular physiological state of the organs. This
contribution may provide useful information on the 'strain' imposed on a subject
as it is reflected in the physiological state of such organs as the respiratory
system and the larynx. Such physiological changes which may occur in a subject
under 'strain' are 'uncontrolled' in that they are not effected by direct motor
control. In addition to uncontrolled physiological changes resulting in a
modification to the speech signal, 'controlled' changes (ie under motor control) may
take place in the speech signal. In the case of air traffic controllers under
high  'stress'  a  message  communicated  may  be  modified  due  to  the  requirements  of 
the  situation.  For  example  in  a  situation  of  high  air  traffic  density  it  may  be 
necessary  for  an  air  traffic  controller  to  communicate  specific  information  to 
several  pilots  in  a  very  short  space  of  time  resulting  in  a  fast  speaking  rate 
but  accompanied  by  a  reduction  in  the  number  of  words  used  to  communicate  the 
information. 


At  the  acoustic  level  the  speech  signal  may  be  described  by  a  number  of 
features,  referred  to  as  prosodic  features,  such  as  the  fundamental  frequency  (or 
pitch  of  the  voice),  formant  frequencies,  intensity  and  duration  (for  a  more 
detailed  description  of  the  speech  signal,  see  section  2). 

These prosodic features of speech may convey linguistic and extra-linguistic
information. However, since the same features carry both types of information it
is necessary to separate those changes in the prosodic features conveying linguistic
information from those conveying extra-linguistic information.


Some physiological changes which may occur in a subject under 'strain' are
increased respiration rate and increased muscle tension, both of which could give
rise to an overall increase in the voice pitch; these and other physiological
changes may also be manifest in some measure in other prosodic features (Fig 2).

In analysing the speech signal to attempt to provide indices of 'strain'
arising from the effects of workload and environment we are essentially looking
for those prosodic features of the signal which are correlates of the subject's
physiological and psychological state. Having found reliable and good correlates
there still remains the problem of deciding to which extra-linguistic features
the changes in prosodic features should be ascribed. Since various emotions are
reflected in the extra-linguistic information this involves, for example, the
relationships which may or may not exist between emotional state and strain due
to workload.

A number of researchers have studied what they generally refer to as 'the
effects of emotions on the voice'. Some of these studies have been concerned
purely with measuring the differences between different types of emotion
whilst others have sought to relate vocal changes to the 'emotional state of
pilots during flight', 'pilot stress', 'state of attention', 'the emotional
state of man under conditions of space flight', 'task-induced stress' and
simply 'stress', 'emotional stimuli' and 'emotional state'. In seeking to
find indicators of workload level or task difficulty in the speech signal, caution
must be exercised in correlating the changes in certain prosodic features reported
by researchers due to various emotions or emotional stimuli with changes due to
task or workload level. Whilst the prosodic features which reflect changes in
the emotional state of a person may well be the same features which will best give
information on the subject's state due to workload level or task difficulty, it is
not at all clear that emotional state bears a one to one correspondence with workload
state. Certain emotional states such as fear or anxiety may well sometimes
be a manifestation of strain resulting from workload imposed upon a subject.
However, other less obvious 'emotions' may well have nothing to do with strain
arising from the workload or task imposed but may be a reflection of the underlying
mood of the subject.

Of all the prosodic features, the voice fundamental frequency appears to be
potentially one of the best carriers of extra-linguistic information (Refs 11, 12)
and for this reason the 'pitch tracker' described in this Report has been
developed to enable a detailed statistical analysis of the voice fundamental
frequency of pilots and air traffic controllers to be carried out.

Early techniques for measuring the fundamental frequency consisted of
measuring the distance between the periodic epochs in the speech signal by hand,
thus giving a period by period measurement, or making average measurements over
consecutive time intervals (Refs 20-23). Whilst such methods render accuracies of
around 0.5% they are obviously unsuitable for large quantities of speech and of
course do not permit automatic tracking of the fundamental frequency in real time.
Another method which has been used is that of low pass filtering the speech and
subjecting the result to some form of frequency measurement. Using fixed cut off
frequency filters only a very limited range of fundamental frequency can be
measured and a signal with a high signal-to-noise ratio is required. Variable cut
off frequency filters have been tried in the past but in the early days suffered
from switching and control transients.

The  advent  of  the  computer  has  made  possible  the  implementation  of  a  number 
of  mathematical  methods  for  analysis  of  the  speech  signal  and  the  extraction  of 
such  parameters  as  the  fundamental  frequency.  Some  of  these  methods  will  be 
discussed in more detail in a later section. Until fairly recently these
algorithms could not be implemented in real time on small minicomputers; however,
the use of microprocessors and specialised digital hardware systems has now
enabled real time calculations to be realised. Such hardware can be complicated
and expensive to construct and the need for a fairly simple system to track the
fundamental frequency of speech often recorded with high background noise has led
to the development of the circuitry described in this Report. The method used
for extraction of the voice fundamental frequency is based on an algorithm
developed by the Joint Speech Research Unit, Cheltenham (JSRU) for a commercial
application, but differs in the voiced/unvoiced decision strategy employed to
achieve successful tracking of the fundamental frequency in high levels of background
noise. The circuitry used to implement the algorithm is believed to be
novel.

2  SPEECH  CHARACTERISTICS 

2.1 The nature of speech and speech production is a vast and complex subject
and so only a brief overview will be given here; the reader is referred to Fant,
Flanagan (Ref 26) and O'Connor (Ref 4) for a more detailed discussion of the subject.

Speech  may  be  described  as  being  made  up  of  words  each  containing  one  or 
more sounds which are called phonemes. A particular phoneme string characterises
a particular word and in English some 40 different sounds or phonemes have been
identified which carry basic linguistic information. At the phonetic level,
speech may be studied in two ways, the first being a study of articulatory phonetics,
showing how sounds are produced and classifying them according to their
method of production. This may then lead onto the way sounds are put together
to convey linguistic information. The second way of studying speech is at the
acoustic level where the acoustic signal between the mouth and ear is investigated
and sounds are described in terms of frequency spectra, intensity and duration.

The varying acoustic features of the speech signal which describe particular
sounds and the context and way in which they are spoken are referred to as
prosodic features and include voice pitch, intensity, formant frequencies and
duration. These features may best be described by first considering the way in
which speech sounds are produced. There are two basic components of the speech
system which interact in the production of sounds: the vocal cords and
the vocal tract, which is itself composed of the oral cavity, pharynx and nasal
cavity (Fig 3).

2.2 Vocal cords

The passage of air over the vocal cords provides a source of energy which
excites the vocal tract, producing sounds which may be classified into two types,
known as voiced and unvoiced sounds, according to whether the vocal cords are
vibrating or not. There are further sub-divisions of sound types which are
determined by the particular articulatory mechanisms employed.

(a)  Voiced  sounds 

These are produced by the passage of air over the vocal cords (Fig 4) which
vibrate in a quasi-periodic fashion. The pulses produced as the vocal cords open
have a spectrum rich in harmonics (Fig 5); the first harmonic is referred to as
the voice fundamental frequency and the period between the pulses as the excitation
period*. In voiced sounds the mode of vibration of the vocal cords determines
whether the voice is normal, creaky or breathy. Each is distinguished by
the amount of air passing over the vocal cords in the open phase. The loudness
of the voice is determined by how wide the vocal cords open, together with the
amount of lung pressure exerted.

(b)  Unvoiced  sounds 

Such  sounds  are  produced  when  the  vocal  cords  are  not  vibrating  and  an 
unimpeded breath of air is produced, eg as in the sounds 'sh' and 'ff'. The cords
may be partially closed together, producing some turbulence of air such as occurs
in the sound 'h'. Acoustically these sounds are characterised by a broadband
noise  spectrum  with  most  energy  in  the  higher  frequencies. 

The  voice  fundamental  frequency  carries  intonational  information  in  the 
speech  signal  and  has  a  mean  value  of  around  125  Hz  in  adult  male  speakers. 



* When no specific method of measurement is implied the term voice pitch will be
used for convenience.

2.3 Vocal tract

The vocal tract, which is composed of the pharynx, nasal and oral cavities,
is excited by the pulse train from the vocal cords and produces a radiated spectrum
at the lips (Fig 7) which is characterised by the particular transfer function
of the vocal tract at any instant (Fig 6). The peaks in the radiated spectrum
are known as formant frequencies F1, F2, etc and arise as a result of resonances
in the vocal tract cavities. It is these formant frequencies which carry
the linguistic information in the speech signal, the positions of the formant
frequencies varying with the articulation of the vocal tract and tongue. Thus
each voiced sound in the English language is characterised by a particular formant
frequency pattern which may be seen on a wideband spectrogram which displays the
energy, typically in 200 Hz frequency bands, over a predetermined frequency range
(for example 0-4 kHz) versus time (Fig 8).

The  pharynx  is  tube  shaped  (Fig  3)  and  may  be  varied  in  length  by  the 
raising  of  the  larynx  or  the  soft  palate.  When  the  soft  palate  is  raised  the 
nasal  cavity  is  occluded  and  is  therefore  not  excited  by  the  glottal  pulses.  Of 
the three cavities it is the oral cavity which plays the greatest role in forming
the sounds since it is capable of considerable change in shape and size due to
the lips, tongue and lower jaw, although it is the tongue which contributes most
in speech production as the prime articulator. As the phonemes are put together
to form words the acoustic pattern resulting is determined by the articulatory
requirements to produce a given word, and a word may consist of voiced and unvoiced
sounds and perhaps gaps of silence within a word, known as 'stop gaps'. The
phonemes of the English language may be divided into groups other than voiced or
unvoiced according to the manner of articulation producing them; however, it is
beyond the scope of this Report to enter into a discussion of this subject.

3  EXTRA-LINGUISTIC  CHANGES  IN  THE  PROSODIC  FEATURES  OF  SPEECH 

3.1 The existence of extra-linguistic changes in particular acoustic features
of speech has already been discussed in section 1. Some of these changes will
now  be  discussed  in  more  detail  with  particular  reference  to  investigations 
carried  out  by  other  workers  into  these  changes. 

3.1.1  Fundamental  frequency 

Of  all  the  features  of  the  speech  signal,  fundamental  frequency  appears  to 
be  the  most  widely  studied  for  extra-linguistic  changes  and  several  researchers 
have found good correlation between changes in the voice fundamental frequency
and the subject's emotional state.

It has been reported that changes in the respiratory pattern frequently
occur in situations of fear or anxiety and Williams suggests that "an increase
in respiration rate would presumably result in an increased subglottal pressure
during speech. This heightened subglottal pressure would give rise to a higher
fundamental frequency during voiced sounds in speech". A further result of
increased respiration rate could be "shorter durations of speech between breaths".
The tension of the larynx muscles would also be expected to have an effect on the
fundamental frequency; an increase in muscle tension caused by certain psychological
states would give rise to an overall increase in the fundamental frequency
and changes in the glottal pulse shape may also result. The work of Williams
and Stevens (Refs 11, 12) shows overall changes in the fundamental frequency between
different types of emotion measured for speech samples of several seconds duration;
the significant variables between types of emotion were found to be mean
fundamental frequency and the fundamental frequency range. Hecker et al also
report changes in mean fundamental frequency, range and contour between a relaxed
subject and the same subject under 'task induced stress'. In some subjects
studied the mean fundamental frequency was found to increase under stress whilst
in others a decrease was noted. Isao Kuroda et al (Ref 13) use a measure of the
excitation period. By calculating a term called the 'vibration space shift rate',
which is a measure of the change of the highest excitation period taken from a
phrase spoken by a pilot in a 'normal' situation to one spoken in an 'urgent'
or emergency situation, the degree of 'stress' was assessed from a nine point
classification of 'shift rate'.

From  the  investigations  carried  out  by  these  researchers  it  would  appear 
that  changes  in  fundamental  frequency  over  and  above  intonational  changes  do 
occur  under  certain  emotions  and  task  or  workload  induced  'strain'.  One  of  the 
problems  associated  with  assessing  changes  in  the  fundamental  frequency  due  to 
extra-linguistic  factors  is  that  of  separating  out  changes  due  to  the  linguistic 
or intonational information being conveyed. Several workers have shown that
samples of voiced speech of 20-30 seconds duration must be used in order to obtain
estimates of the mean fundamental frequency which are independent of the speech
content (Refs 28, 29).


3.1.2  Formant  frequencies 


Simonov et al report the ability to distinguish between differing emotional
states (joy, delight, anxiety and fear) using a measure of fundamental frequency
and the average number of zero crossings occurring in the first formant region of
the speech signal (an estimate of the first formant frequency F1). The method
described plots, for a particular Russian vowel, the ratio of the zero crossing
frequency in a state of rest to the frequency in an emotional state versus the
ratio of the fundamental frequency in a state of rest to the frequency in the
emotional state. Average values for the ratios are determined from three sound
impressions. Using a linear discriminant function, 92-94% discrimination between
emotional states and state of rest was obtained for all the cases examined.
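
The zero crossing measure mentioned above can be illustrated with a minimal
sketch in Python. It is not the procedure of Simonov et al, whose details are not
given here; it only shows the general idea that a band-limited signal dominated
by one component crosses zero about twice per cycle. The function name, the
sampling rate and the two test frequencies (500 Hz and 550 Hz) are assumptions
made purely for illustration.

```python
import numpy as np

def zero_crossing_frequency(x, fs):
    """Estimate the dominant frequency (Hz) of x from its zero-crossing count."""
    negative = np.signbit(x)
    # A crossing occurs wherever two neighbouring samples differ in sign.
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    duration = (len(x) - 1) / fs          # time spanned by the samples, in seconds
    return crossings / (2.0 * duration)   # two crossings per cycle

# Hypothetical test signals standing in for the band-limited first formant region.
fs = 8000
t = np.arange(fs) / fs
rest = np.cos(2 * np.pi * 500.0 * t + 0.3)
aroused = np.cos(2 * np.pi * 550.0 * t + 0.3)

f_rest = zero_crossing_frequency(rest, fs)
f_emotional = zero_crossing_frequency(aroused, fs)
ratio = f_rest / f_emotional  # rest-to-emotional ratio, as plotted in the method described
```

On real speech the signal would first have to be band-limited to the first
formant region, and the estimate is only as good as the assumption that a single
component dominates that band.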

It  is  possible  that  changes  in  the  formant  frequencies  may  result  from 
increased  muscle  tension  causing  a  variation  in  the  vocal  tract  shape  above  that 
which  is  caused  by  articulation. 

3.1.3  Amplitude 

Hecker et al report changes in the mean amplitude of two phonetically
stressed vowels selected from a test phrase spoken under 'control' and 'task
induced stress' situations. Only four out of ten subjects, however, showed
'definite and consistent trends' in the amplitude changes and for one of the four
subjects the amplitude change under 'stress' was the reverse of that for the other
three. Friedhoff et al also report changes in the average amplitude of a
spoken word when a subject was presented with emotional stimuli. In this case a
'baseline'  amplitude  was  established  with  neutral  stimuli  and  masking  noise 
presented  to  the  ears.  Some  subjects  in  this  experiment  showed  a  decrease  in 
amplitude  when  presented  with  emotional  stimuli  whereas  others  showed  an  increase 
in  amplitude  over  the  baseline  level.  It  could  perhaps  be  concluded  from  these 
experiments  that  the  direction  of  change  in  amplitude  is  not  as  important  as  the 
fact that a change has taken place. Presumably the changes in amplitude could be
related in some way to respiratory changes in the subjects; however, in view of
the difficulty in establishing constant 'baseline' amplitude levels this technique
appears to be very difficult to implement as a method for assessing changes under
emotion or 'strain'.

3.1.4  Duration  of  sounds 

If the respiration rate of a subject increases under strain or some emotion
it could be expected that the duration of sounds or words between breaths would
shorten; there does not, however, appear to have been any work done to assess this
change in the speech pattern.

4  SURVEY  OF  PRESENT  METHODS  OF  'PITCH'  MEASUREMENT 

4.1 The assessment of the pitch of voiced speech may be attempted by one of
three methods.

(a) Making measurements from the temporal properties of the signal.

(b) Making measurements from the frequency domain properties of the
signal.

(c) Utilising both the time and frequency domain properties of the signal.

Examples  of  time  domain  pitch  detectors  are  those  using  zero  crossing 
measurements,  peak  and  valley  measurement  and  autocorrelation  measurements.  Such 
methods  require  the  preprocessing  of  the  signal  to  minimise  the  effects  of  formant 
structure  on  the  waveform.  Frequency  domain  pitch  detectors  are  based  on  the 
principle  that  if  the  signal  is  periodic  then  the  frequency  spectrum  of  the 
signal  will  contain  a  fundamental  frequency  plus  harmonics  of  that  frequency. 
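
As an illustration of a time domain detector, the following minimal Python sketch
estimates pitch from the autocorrelation peak of a single voiced frame. It is a
sketch under stated assumptions, not a method taken from this Report: the
function name, the 60-400 Hz search range, the 8 kHz sampling rate and the
synthetic five-harmonic test frame are all chosen for illustration, and no
preprocessing to suppress formant structure is included.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one voiced frame from its autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)   # shortest period considered, in samples
    lag_max = int(fs / fmin)   # longest period considered, in samples
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / lag

# Synthetic 40 ms frame: five harmonics of an assumed 125 Hz fundamental.
fs = 8000
t = np.arange(int(0.04 * fs)) / fs
f0 = 125.0
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

estimate = autocorr_pitch(frame, fs)
```

Restricting the lag search to a plausible pitch range is one simple way of
avoiding the zero-lag maximum; real speech would additionally need a
voiced/unvoiced decision before the estimate is trusted.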

The term 'fundamental frequency' of the voice really refers to the frequency of
the first harmonic of the vocal source frequency spectrum (ie it implies a
measurement made in the frequency domain).

An example of a method operating in the frequency domain is cepstrum
analysis which will be briefly described later. Hybrid methods of pitch assessment
involve measurements in both the frequency and time domains; an example of
this method is the simplified inverse filter technique (SIFT) of Markel (1972), in
which the speech signal is inverse filtered to produce a spectrally flattened
signal which is then autocorrelated.

Early  methods  of  determining  the  voice  pitch  involved  peak  detection  or  the 
use of fixed or variable low pass filters, each generally employing some form of
non-linear  preprocessing  of  the  speech  waveform  to  enhance  the  fundamental 
frequency  component.  Peak  detection  methods  work  on  the  basis  of  detecting  the 
epochs  in  the  speech  waveform  corresponding  to  the  vocal  cord  pulse  train  produced 
in  voiced  sounds.  The  distance  between  these  epochs  or  'pulses'  is  taken  as  the 
excitation  or  pitch  period;  however  it  should  be  noted  that  the  voiced  excitation 
of  the  vocal  tract  is  only  quasi-periodic.  Further  difficulties  in  using  the 
peak  detection  method  of  period  estimation  arise  from  the  fact  that  the  glottal 
waveform  varies  in  shape  and  amplitude  as  well  as  period  and  this  can  make  it 
difficult  to  decide  which  epochs  on  the  speech  waveform  should  be  chosen  for 
period  estimation. 

The use of fixed low pass filters to extract the first harmonic of the
glottal waveform is limited to high quality speech signals where the fundamental
frequency range is small. Variable low pass filters can be used to overcome the
problem of fundamental frequency range but early attempts proved unsatisfactory
due to switching or control transients. With these problems overcome, this
technique is still only suitable for high quality signals in which the fundamental
frequency is present. Non-linear preprocessing methods which have been used to
enhance the fundamental include half wave and full wave rectification and
squaring (Ref 24).

Since the early 1960s much work has been done on the use of algorithms
implemented on computers and special hardware to analyse the speech waveform,
and there is considerable current activity in this area.

Perhaps one of the best known methods for speech analysis is cepstrum
processing. This technique used for the extraction of 'pitch' is described by
Noll (Refs 31, 32). The term cepstrum refers to the spectrum or Fourier transform
of the log-amplitude spectrum and since the reciprocal of frequency is the resultant
variable the term 'quefrency' was coined by the inventors to describe it. The
log-taking operation has the property of separating the vocal source characteristics
(ie harmonic structure) from the vocal system characteristics (ie envelope
of spectrum) (Fig 9). As a result of the separation which occurs, the cepstrum
method is suitable for the analysis of formant frequencies as well as larynx
excitation frequency. In order to extract automatically the pitch from the
resulting cepstrum of consecutive samples of the speech waveform (typically 20 ms)
further algorithms must be applied to detect, with appropriate thresholds and
boundaries, the peak in the cepstrum representing the quefrency or period. For
a more thorough and detailed description of the method the reader is referred to
Refs 31 and 32.
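
The cepstrum peak-picking step can be sketched in a few lines of Python. This is
a minimal illustration under stated assumptions rather than the procedure of
Refs 31 and 32: the function name, the Hamming window, the 60-400 Hz search range
and the synthetic test frame (a 100 Hz pulse train passed through a single
decaying 500 Hz resonance, 40 ms long at an assumed 8 kHz sampling rate) are all
hypothetical, and no thresholds or voiced/unvoiced logic are included.

```python
import numpy as np

def cepstrum_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) from the largest cepstral peak in the expected period range."""
    windowed = frame * np.hamming(len(frame))
    # Log-amplitude spectrum; the small constant guards against log(0).
    log_spectrum = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    # Transform of the log-amplitude spectrum: the (real) cepstrum.
    cepstrum = np.fft.ifft(log_spectrum).real
    q_min = int(fs / fmax)   # shortest period considered, in samples
    q_max = int(fs / fmin)   # longest period considered, in samples
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / peak

# Synthetic voiced frame: pulses every 80 samples (100 Hz) exciting one resonance.
fs = 8000
n = 320
pulses = np.zeros(n)
pulses[::80] = 1.0
k = np.arange(64)
resonance = np.exp(-k / 20.0) * np.cos(2 * np.pi * 500.0 * k / fs)
frame = np.convolve(pulses, resonance)[:n]

estimate = cepstrum_pitch(frame, fs)
```

The slowly varying resonance contributes mainly at low quefrencies, while the
pulse spacing appears as a peak near 80 samples, which is the separation of
source and system described above.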


Another approach to the analysis of the speech waveform is the use of
inverse filtering in which the excited vocal tract is modelled by a time varying
linear filter whose parameters may be determined by linear prediction
analysis (Refs 33, 34). Numerous other algorithms have been proposed for pitch
extraction, see Refs 35-44. An interesting comparison of some of these algorithms
is presented by Rabiner et al (Ref 45).


One of the problems associated with these algorithms is that their implementation
on a general purpose minicomputer does not allow analysis in real time.
Special purpose hardware can be, and has been, used to implement some of these
algorithms allowing analysis in real time; however, such hardware can be both complex
and expensive. The analysis of bandwidth limited and/or noisy signals also
poses a restriction on the type of algorithm which can be used for analysis, such
methods as the cepstrum analysis for example being suited to bandwidth limited
signals such as would be obtained from telephone speech. This method can also be
used with some success on noisy signals. A digital hardware cepstrum processor
used at the JSRU has proved capable of tracking the pitch of a pilot's voice in
fairly high background noise levels.

The pitch extractor to be described here employs a simple frequency domain
algorithm similar to that developed at the JSRU and allows implementation in real
time using fairly inexpensive analogue hardware. This particular algorithm has
also proved to work well with signals containing high background noise levels.

4.2  Description  of  the  algorithm 

The  basis  of  the  algorithm  is  the  extraction  of  the  envelope  of  a  bandpass 
filtered  speech  waveform  processed  through  a  full  wave  rectifier. 

If  a  speech  signal  is  passed  through  a  bandpass  filter  whose  pass  band  is 
300-600  Hz  the  resulting  signal  will  contain  some  harmonics  of  the  fundamental 
frequency  and  may  in  the  case  of  a  child  or  female  contain  the  fundamental  itself. 
The  harmonics  present  in  the  pass  band  will  beat  together  and  the  resulting 
envelope  of  the  signal  will  correspond  to  the  fundamental  frequency.  In  order  to 
extract  this  'envelope'  frequency  the  bandpass  filtered  signal  is  full  wave 
rectified,  then  high  pass  filtered  at  90  Hz  to  remove  the  dc  component  and  passed 
through a variable low pass filter. The cut off frequency of the low pass filter
must be such that only the fundamental frequency component or envelope frequency
is passed. This is achieved by employing a 'feedback' system (Fig 10) in which
the output of the variable (tracking) low pass filter is fed to a phase locked
loop (PLL) whose loop error voltage is then used to control the tracking filter.

Once  the  PLL  has  acquired  lock  the  cut  off  frequency  of  the  tracking  filter  is 
moved  by  the  control  voltage  to  a  value  which  admits  only  the  fundamental  frequency 
component.  The  'idle'  frequency  of  the  voltage  controlled  oscillator  of  the  PLL 
(ie  frequency  when  no  input  signal  present)  must  be  such  that  the  control  voltage 
to  the  tracking  filter  produces  a  cut  off  frequency  in  the  filter  sufficiently 
high  to  allow  the  PLL  to  lock  onto  high  voice  pitches  but  not  so  high  as  to  admit 
too  high  a  proportion  of  the  harmonics  of  low  fundamentals  causing  'pitch  doubling' 
effects.  A  voiced/unvoiced  decision  is  made  simply  on  the  basis  of  detecting  the 
envelope  of  the  full  wave  rectified  fundamental  frequency  obtained  from  the  pitch 
extractor,  Fig  11.  A  variable  threshold  level  at  which  the  voiced/unvoiced 
decision  is  made  is  used  to  avoid  decisions  being  made  on  spurious  low  amplitude 
signals  derived  from  background  noise.  This  method  was  found  to  be  superior  to 
the  use  of  energy  comparisons  in  high  and  low  frequency  bands  for  voiced/unvoiced 
detection  when  the  speech  material  contained  high  background  noise  with  a  broad 
spectrum  extending  up  to  several  kilohertz  before  falling  off  in  intensity. 
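
The beating of harmonics described above can be illustrated numerically. The following is a minimal sketch, not the analogue circuit itself: it assumes the band-passed speech can be modelled as a sum of equal-amplitude harmonics, replaces the 90 Hz high pass filter by simple mean removal, and replaces the tracking low pass filter and PLL by a search for the strongest component between 60 Hz and 400 Hz. The function name and all parameter values are hypothetical.

```python
import math

def rectified_envelope_pitch(harmonics_hz, fs=8000, duration=0.2,
                             search_hz=range(60, 401, 5)):
    """Estimate the envelope frequency of a full wave rectified harmonic sum."""
    n = int(fs * duration)
    # Band-passed speech modelled as equal-amplitude harmonics (assumption).
    x = [sum(math.cos(2 * math.pi * f * i / fs) for f in harmonics_hz)
         for i in range(n)]
    # Full wave rectification.
    rect = [abs(v) for v in x]
    # Remove the dc component (stands in for the 90 Hz high pass filter).
    mean = sum(rect) / n
    ac = [v - mean for v in rect]
    # Pick the strongest component in the search band (stands in for the
    # tracking low pass filter and phase locked loop).
    best_f, best_mag = None, -1.0
    for f in search_hz:
        re = sum(v * math.cos(2 * math.pi * f * i / fs) for i, v in enumerate(ac))
        im = sum(v * math.sin(2 * math.pi * f * i / fs) for i, v in enumerate(ac))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f

# Harmonics 3, 4 and 5 of a 100 Hz voice all lie in the 300-600 Hz pass band.
print(rectified_envelope_pitch([300, 400, 500]))
```

Although the 100 Hz fundamental is absent from the band-passed signal in this example, the rectified envelope repeats at the fundamental period, which is the property the tracker relies on.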
5 CIRCUIT DESCRIPTION

5.1 Pitch tracker

5.1.1  Bandpass  filter 

A commercial (KEMO) four pole variable Butterworth bandpass filter is used
in the prototype pitch tracker, having an attenuation in the stop bands of 48 dB
per octave. The pass band is set to 300-600 Hz and input signal levels required
are of the order of 1-2 volts peak-to-peak.

5.1.2 Full wave rectifier

The output from the bandpass filter is fed to a full wave precision
rectifier via a 10x amplifier. Precise rectification is achieved by means of an
operational amplifier and diode configuration (Fig 12a) which operates as follows.

For positive inputs the first operational amplifier acts as an inverting
amplifier, diode D1 conducts and D2 is cut off. The second operational amplifier
acts as a unity gain inverter and the output is positive:

$$V_o = +V_{in}$$

Negative inputs cause D2 to conduct and cut off D1. The expressions for this
case, which involve the network resistances R and 3R of Fig 12a, again give a
positive output.

Thus for negative inputs the output is positive. The envelope of the full
wave rectifier output is the fundamental frequency.

5.1.3 90 Hz high pass filter

The output from the full wave rectifier is fed to a 90 Hz high pass filter
with a third order Butterworth response which removes the dc component of the
signal and any low frequency syllabic modulations.

5.1.4 Tracking low pass filter

The tracking low pass filter is designed around the so-called 'two integrator
loop' (or analogue computing loop) using operational transconductance amplifiers
(RCA CA3080) to provide a voltage controllable current source to the
integrators of the loop. A 'fourth order' filter system is used in the pitch
tracker but since this consists of two identical two integrator loops cascaded
together, only one loop will be described in detail.


The transfer operator of a filter may be regarded as a symbolic representation
of a linear differential equation which can be solved using standard
analogue computing techniques. A second order low pass filter voltage transfer
function has the form (Ref 46, p 69):

$$T(s) = \frac{A_0\,\omega_0^2}{s^2 + b\,\omega_0 s + \omega_0^2} \qquad (1)$$

where $s$ = complex variable (frequency domain)

$A_0$ = scaling factor

$b$ = damping coefficient

$\omega_0$ = cut off frequency.


If the complex variable is replaced by the differential operator d/dt and
the equation rearranged, a differential equation results which can be solved by
using two integrators and an adder (Fig 12b).

The operation of the 'two integrator loop' may be described as follows, referring
to Fig 12b.

which is the equation we have in (2)

It can also be shown that $V_o$ is the low pass response by arranging (3) in the
form of the general low pass transfer function (1).

The Q of the filter may be adjusted by R2 where $R_2/R_1 = 2Q - 1$. A 'true'
fourth order low pass filter should be obtained by using the fourth order transfer
function with a differential operator and solving the equation with the appropriate
analogue computing elements in the manner described for the second order response.
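
The second order case discussed above can be written out as a short worked derivation. The form of the transfer function used here is an assumption: it is a standard second order low pass form consistent with the symbols $A_0$, $b$ and $\omega_0$ used elsewhere in this Report, and may differ in layout from the expression printed in Ref 46.

```latex
% Assumed standard second order low pass form
T(s) = \frac{V_o}{V_{in}} = \frac{A_0\,\omega_0^2}{s^2 + b\,\omega_0 s + \omega_0^2}

% Cross-multiply
\left(s^2 + b\,\omega_0 s + \omega_0^2\right) V_o = A_0\,\omega_0^2\, V_{in}

% Replace s by d/dt
\frac{d^2 V_o}{dt^2} + b\,\omega_0 \frac{dV_o}{dt} + \omega_0^2 V_o = A_0\,\omega_0^2\, V_{in}

% Rearrange for the highest derivative
\frac{d^2 V_o}{dt^2} = \omega_0^2\left(A_0 V_{in} - V_o\right) - b\,\omega_0 \frac{dV_o}{dt}
```

Integrating the right hand side twice recovers $V_o$, which is why two integrators and an adder suffice. If each integrator has time constant $RC$, then $\omega_0 = 1/RC$, so changing $R$ moves the cut off frequency without altering the shape of the response.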

Using fixed values of C it can readily be seen that varying R (Fig 12b)
allows tuning of the cut off frequency ($\omega_0$) of the filter. In order to
allow the value of R to be controlled electrically, operational transconductance
amplifiers (OTA) have been utilised, providing voltage controllable resistors
to the integrators of the two integrator loop (Fig 13). Only a brief description
of the OTA will be given here to provide sufficient understanding of its operation
in the two integrator loop; for a more detailed description the reader is referred
to Ref 47.


The output current of the OTA ($I_{out}$) is proportional to the differential
input voltage ($e_{in}$) and the transconductance ($g_m$) of the amplifier, the
latter being determined by the amplifier bias current ($I_{ABC}$):

$$I_{out} = g_m e_{in}$$

This equation is analogous to the Ohm's law equation, with $1/g_m$ taking the
place of the resistance.

From these equations it can be seen that variation of the amplifier bias
current $I_{ABC}$ provides a current controllable resistor which, by the use of a
resistive input to the amplifier bias input, may be made voltage controllable.
The control voltage is derived from the phase locked loop error voltage with
appropriate scaling (this is discussed in section 5.1.6) and may be related to
the cut off frequency in the following manner. Referring to Fig 13, expression
(4) for $V_{02}$ is obtained by Kirchhoff's law, assuming 'virtual earth' at the
input of the amplifier.

From the discussion on the simple two integrator loop using a resistor in
the integrator we found that each integrator introduces the factor $1/\omega_0$
on the derivative $dV_o/dt$, where $\omega_0 = 1/RC$ is the cut off frequency of
the filter.

In the expression (4) for $V_{02}$, where the resistor is replaced by an OTA,
the cut off frequency of the filter is thus defined as:

$$\omega_0 = \frac{19.2\, I_{ABC}}{C}$$

For the particular circuit described here the bias current is governed by the
term $-V - V_c$, where $V$ = negative supply voltage to OTA.

The measured low pass filter response versus control voltage for two cascaded
second order filters is shown in Fig 14.
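
The dependence of cut off frequency on bias current can be sketched numerically. This is a rough illustration only: it assumes the CA3080 room temperature relation $g_m = 19.2\,I_{ABC}$ and an integrator capacitor of 0.1 µF, which is a hypothetical value since the Report does not state the capacitor used. The function name is likewise illustrative.

```python
import math

GM_PER_AMP = 19.2  # CA3080 transconductance per ampere of bias current (S/A), room temperature

def ota_cutoff_hz(i_abc_amps, c_farads):
    """Cut off frequency of an OTA-C integrator stage, from omega0 = g_m / C."""
    omega0 = GM_PER_AMP * i_abc_amps / c_farads  # rad/s
    return omega0 / (2 * math.pi)

C_ASSUMED = 0.1e-6  # hypothetical integrator capacitor, farads
for i_abc in (2e-6, 5e-6, 10e-6):
    print(f"I_ABC = {i_abc * 1e6:4.1f} uA -> f0 = {ota_cutoff_hz(i_abc, C_ASSUMED):6.1f} Hz")
```

Under these assumptions the cut off frequency is directly proportional to the bias current, so doubling the current doubles the cut off frequency.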

5.1.5  Dc  decoupling 

Dc  decoupling  is  achieved  with  the  circuit  of  Fig  15a  which  is  an  ac  coupled 
non  inverting  follower. 

The high frequency gain is unity and the dc gain zero, with the low frequency
cut off being determined by the values of R and C:

$$f_c = \frac{1}{2\pi RC}$$

Values of 27 kΩ for R and 0.1 µF for C give a low frequency cut off of
around 60 Hz.
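
The quoted cut off can be checked with a one-line calculation. The sketch below assumes the usual first order relation $f_c = 1/(2\pi RC)$ for an ac coupled follower; the function name is illustrative.

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """Cut off frequency of a first order RC section."""
    return 1.0 / (2 * math.pi * r_ohms * c_farads)

# Component values quoted in the text: 27 kilohm and 0.1 microfarad.
print(round(rc_cutoff_hz(27e3, 0.1e-6), 1))
```

The result is just under 59 Hz, in line with the figure of around 60 Hz given above.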

5.1.6  Phase  locked  loop 

The phase locked loop (PLL) used is a SIGNETICS 565 with additional circuitry
added to extend its lock range (Ref 46, p 239). Details of the operation
of the 565 PLL are given in Ref 46, Chapter 7 and Ref 48 but will not be given
here. To enable the loop to acquire lock over a bandwidth extending from 60 Hz
to 400 Hz with a free running voltage controlled oscillator (VCO) frequency of
around 150 Hz an extended lock range is required, since the lock range of the 565
in a standard configuration is only around ±60% of the free running VCO frequency.
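
The need for the extended range follows from simple arithmetic, sketched below. The ±60% figure and the 150 Hz free running frequency are taken from the text; the function name is illustrative.

```python
def lock_range_hz(f0_hz, fraction=0.6):
    """Lower and upper lock limits for a lock range of +/- fraction about f0."""
    return f0_hz * (1 - fraction), f0_hz * (1 + fraction)

low, high = lock_range_hz(150.0)
required_high = 400.0  # upper end of the required band, from the text
print(f"standard lock range: {low:.0f}-{high:.0f} Hz")
print("covers 400 Hz:", high >= required_high)
```

With a 150 Hz free running frequency the standard configuration reaches only about 240 Hz, well short of the 400 Hz required for high pitched voices.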

(a)  Extended  range 

Under normal circuit configuration (Fig 15b) the free running frequency of
the VCO is determined by R1 and C1

$$f_0 = \frac{1.2}{4 R_1 C_1}\ \text{Hz}$$ (Manufacturer's design information).

The free running frequency is defined as the oscillation frequency of the VCO
without input signal and both inputs grounded. Capture range is the range of
frequencies either side of $f_0$ over which the PLL will acquire lock with an
input signal initially starting out of lock. The lock range or tracking range is
defined as the range of frequencies either side of $f_0$ that the VCO will remain
locked to once locked on to an input signal.

To extend the lock range of the PLL to greater than ±60% of $f_0$ the
circuitry described in Ref 46, pp 238-239 is used (Fig 16), in which increased
loop gain is achieved by supplying all the VCO charging current into pin 8 using
an external current source (two BCY 71's) controlled by the loop error voltage.
With this circuit configuration the frequency of the VCO now becomes a function
of the charging current and C1 (Fig 16) where:

$$f_{VCO} = \frac{I}{2 V_s C_1}$$

with $I$ the charging current.

The lock range can be controlled by the resistor which sets the gain
of the 741 operational amplifier and the centre frequency may be adjusted by
altering the voltage to the non inverting terminal of the operational amplifier.

(b)  Loop  error  voltage  and  filter  control  voltage 

The free running VCO frequency is set at 150 Hz using a timing capacitor
C1 of 0.47 µF and adjusting the voltage to the non inverting terminal of the
operational amplifier with a 50 kΩ potentiometer connected as a potential divider
between +15 volts and ground (Fig 16). At this frequency the output voltage $V_o$
from the operational amplifier is 10 volts; this voltage is used to establish a
cut off frequency for the tracking low pass filter of 150 Hz with no input signal
to the filter or PLL. A graph of amplifier output voltage (loop error voltage)
versus the VCO frequency is shown in Fig 17.

The cut off frequency of the tracking filter when no input signal is present is
crucial in determining what range of voice fundamental frequencies may be tracked
by the filter. Selection of too high a cut off frequency may result in the PLL
locking onto the second harmonic of the glottal source spectrum from low pitched
voices, whilst too low a cut off frequency would prevent the PLL from locking onto
the fundamental frequency of high pitched voices such as those of women or
children.

With a bandpassed speech signal input to the full wave rectifier (as described in
earlier sections) the tracking low pass filter produces the envelope frequency at
its output. With this signal input to the PLL, the loop error voltage changes,
causing the VCO to oscillate at the same frequency as the input signal (providing
the input signal is within the capture range of the PLL). The loop error voltage
is also used to control the cut off frequency of the tracking filter so that as
the PLL locks onto the output from the filter its cut off frequency is optimised
for the envelope frequency. As the envelope or fundamental frequency varies, so
the loop error voltage varies, controlling the tracking filter, and thus the
filter tracks the fundamental frequency.

Using the tracking filter response plot of cut off frequency versus control
voltage shown in Fig 14 and the plot of loop error voltage versus VCO frequency
for the PLL shown in Fig 17, a suitable transfer function may be developed to use
the loop error voltage to control the cut off frequency of the filter and set
optimum cut off frequencies for the fundamental frequency being tracked. By
plotting the filter control voltage required to extract particular fundamental
frequencies versus the loop error voltage and VCO frequency, the transfer function
is obtained and is found to be a simple linear relationship (Fig 18). This
transfer function is implemented using a single operational amplifier with the
gain set to the value of the reciprocal of the slope of the line and an 'offset'
voltage applied to the non-inverting terminal equal to the constant in the
equation


$$V_o = m V_c + C$$

$$V_c = \frac{1}{m}\left(V_o - C\right)$$

where $V_o$ = loop error voltage (output from operational amplifier, see Fig 16)

$V_c$ = filter control voltage

$m$ = slope

$C$ = constant.

From  Fig  18  it  can  be  seen  that  the  required  values  of  m  and  C  are 
-0.66  and  5  volts  respectively. 
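
The linear scaling can be sketched as a pair of functions. This assumes the relationship has the form $V_o = mV_c + C$, with the values of m and C quoted above; the function names are illustrative, and the printed control voltages are arithmetic consequences of that assumed form rather than measured values.

```python
M_SLOPE = -0.66   # slope m read from Fig 18 (as quoted in the text)
C_OFFSET = 5.0    # constant C in volts (as quoted in the text)

def control_voltage(v_o):
    """Filter control voltage V_c for a given loop error voltage V_o."""
    return (v_o - C_OFFSET) / M_SLOPE

def loop_error_voltage(v_c):
    """Inverse relation: loop error voltage V_o for a given control voltage V_c."""
    return M_SLOPE * v_c + C_OFFSET

# Idle condition quoted in the text: V_o = 10 volts at 150 Hz.
print(round(control_voltage(10.0), 2))
```

The gain of the scaling amplifier is 1/m, so a slope of -0.66 corresponds to an inverting gain of roughly 1.5.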

The loop error voltage ($V_o$) is smoothed using an RC low pass filter of
time constant 0.1 second and buffered by a voltage follower before being passed
to the scaling circuit (Fig 19a) which achieves the transfer function described
previously.

5.2 Voiced/unvoiced decision circuitry

5.2.1 Full wave rectifier

The fundamental frequency derived from the tracking low pass filter output
described in section 5.1.4 is amplified by a factor of 10 and then full wave
rectified using a similar circuit to that described in section 5.1.2. This
signal is then low pass filtered in order to obtain its envelope.

5.2.2 Low pass filter

A second order filter with a Butterworth response and a cut off frequency
of 45 Hz is used to derive the envelope of the full wave rectified fundamental
frequency (Fig 19b). The general equation for the transfer function of a second
order low pass filter is shown in equation (1).

For an active filter $A_0$ is the gain of the operational amplifier (Ref 46,
pp 54-59); the circuit is shown in Fig 19b.

If for simplicity of design $R_1 = R_2 = R$ and $C_1 = C_2 = C$ then $\omega_0$
reduces to

$$\omega_0 = \frac{1}{RC}$$

and the expression for b reduces to:

$$b = 3 - K = 3 - A_0$$

For a Butterworth response the value of b in the Butterworth polynomial
may be found in standard filter design tables and for a second order filter is
1.4142. Using this value the gain of the amplifier may be obtained, where

$$A_0 = 3 - b \approx 1.6$$

also

$$A_0 = 1 + \frac{R_b}{R_a}$$

hence

$$\frac{R_b}{R_a} \approx 0.6$$

the values for $R_a$ and $R_b$ in this case being chosen to be 5.6 kΩ and 3.3 kΩ
respectively.
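
The design arithmetic above can be reproduced in a few lines. This sketch assumes the equal-component case with $b = 3 - A_0$ and a non-inverting gain of $A_0 = 1 + R_b/R_a$. The capacitor value is hypothetical, since the Report does not state the value used, and the function name is illustrative.

```python
import math

def equal_component_butterworth(fc_hz, c_farads):
    """Gain, gain-setting resistor ratio and R for a second order Butterworth stage."""
    b = math.sqrt(2)                  # second order Butterworth coefficient, 1.4142
    a0 = 3 - b                        # required amplifier gain
    rb_over_ra = a0 - 1               # from A0 = 1 + Rb/Ra
    r = 1.0 / (2 * math.pi * fc_hz * c_farads)  # from omega0 = 1/RC
    return a0, rb_over_ra, r

a0, ratio, r = equal_component_butterworth(45.0, 0.1e-6)  # C is an assumed value
print(f"A0 = {a0:.3f}, Rb/Ra = {ratio:.3f}, R = {r / 1e3:.1f} kilohm")
print(f"chosen 3.3k / 5.6k = {3.3 / 5.6:.3f}")
```

The preferred values 3.3 kΩ and 5.6 kΩ give a ratio within about one per cent of the ideal 0.586.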

The  envelope  of  the  fundamental  frequency  thus  derived  is  then  fed  to  a 
comparator  which  compares  this  low  frequency  varying  voltage  with  another  preset 
voltage. 




5.2.3  Voltage  comparator 

The circuit consists basically of a biassed crossing detector (Fig 20)
formed around a single operational amplifier whose output voltage goes negative
when the input voltage slightly exceeds the reference voltage of opposite
polarity. When this criterion is not met the output voltage is a fraction of a volt.
With no input signal to the comparator but with a negative reference current
$i_{ref}$ flowing from the non-inverting terminal of the amplifier due to the
reference voltage $V_{ref}$, the diode D1 conducts when the current exceeds a few
microamperes and an output voltage of 0.5 volt to 0.6 volt is obtained. When an
input current $i_{in}$ fractionally greater than, and of opposite sign to, the
reference current is present, the output voltage swings negative to the value of
the zener voltage, thus keeping the sum of the currents at the inverting terminal
zero.

By  adjusting  the  reference  voltage,  the  threshold  at  which  the  comparator 
switches  to  its  negative  output  state  may  be  altered.  Adjustment  of  the  reference 
voltage  allows  for  a  threshold  to  be  set  just  above  the  envelope  voltage  of  any 
spurious  signals  derived  from  the  tracking  filter  due  for  example  to  noise.  The 
reference  voltage  is  set  by  a  ten  turn  potentiometer  which,  with  a  multiturn 
dial,  allows  precise  setting  of  the  threshold  and  the  required  settings  for 
particular  speech  material  to  be  noted. 

The  output  from  the  comparator,  which  indicates  the  presence  of  voicing  in 
the  speech  signal,  is  used  to  gate  the  output  from  a  frequency  to  voltage  converter 
used  to  convert  the  fundamental  frequency,  derived  in  this  case  from  the  VCO  of 
the  phase  locked  loop,  to  a  voltage  level. 

5.2.4  Frequency  to  voltage  converter 

For  display  purposes  and  to  provide  a  suitable  signal  for  feeding  into  an 
analogue  to  digital  converter,  the  fundamental  frequency  is  converted  into  a 
voltage  level  by  a  frequency  to  voltage  converter  shown  in  Fig  21. 

The square wave output from the VCO of the phase locked loop is used for
convenience in the frequency to voltage conversion and is fed to the base of
transistor Tr1 (Fig 21) via a capacitor C1. Transistors Tr1 and Tr2 comprise a
monostable oscillator and on the positive going edge of an input square wave a
negative going pulse is obtained on the collector of Tr1 and a positive going
pulse on the collector of Tr2. The negative going pulse is differentiated by a
capacitor and fed to the base of Tr4, which is supplied with a constant current
from Tr3; Tr3 and Tr4 together form a ramp generator. A positive spike is
produced at the base of Tr4 on the positive edge of the negative pulse,
causing the ramp capacitor to discharge. Before it discharges, however, the
leading edge of the positive pulse derived from the monostable oscillator switches
on the transistor Tr5, allowing C5 to charge to the value on the ramp capacitor.
C5 is buffered by an LM308 operational amplifier which has an input impedance of
40 MΩ, thus minimising leakage; the voltage on C5 then appears at the output of
the amplifier, which is connected for unity gain. The output of the LM308 is fed
to the source terminal of a field effect transistor (FET) controlled by the
comparator output voltage. When the comparator output is negative (ie a voiced
decision is made) the FET has a high resistance and the LM308 output is available
at the source terminal of the FET.

When the comparator is in the off state (ie an unvoiced decision is made) the
output from the LM308 is grounded by the FET.
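
The gating behaviour described above can be mimicked in software. The sketch below is an idealised model, not a circuit simulation: it assumes the comparator and FET act as a simple switch that passes the frequency to voltage converter output when the envelope exceeds the threshold and grounds it otherwise. The function name and sample values are hypothetical.

```python
def gate_pitch(f_to_v_volts, envelope_volts, threshold_volts):
    """Pass the converter output when voiced, output 0.0 (grounded) when unvoiced."""
    gated = []
    for pitch_v, env_v in zip(f_to_v_volts, envelope_volts):
        voiced = env_v > threshold_volts   # comparator decision
        gated.append(pitch_v if voiced else 0.0)
    return gated

# Hypothetical samples: the first frame is low level noise and is rejected.
print(gate_pitch([1.2, 1.3, 1.1], [0.05, 0.4, 0.5], threshold_volts=0.2))
```

Raising the threshold corresponds to turning the ten turn potentiometer so that spurious low amplitude signals from background noise are rejected.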

6  CONSTRUCTIONAL  DETAILS 

The fundamental frequency tracking circuitry is assembled on one 114 × 203 mm
circuit board and the voiced/unvoiced circuitry on another. The two circuit boards
are of the 'plug in' type (43 way) and are incorporated into a 19 inch rack
mounting system with 2 inch circuit board fronts on which are mounted BNC input
and output sockets and a ten turn precision dial controlling the voiced/unvoiced
decision threshold (Fig 24).

A ±15 volt power supply (ITT Powercard PC250A) mounted in a 4 inch wide
rack mounting module is fitted into the 19 inch rack and provides power for the
two circuit boards.

7  DISCUSSION 

7.1 Of all the acoustic features of the speech signal the fundamental frequency
appears to be potentially the best candidate for revealing extra linguistic
changes and hence giving some indication of the physiological and psychological
state of a person. For this reason the fundamental frequency tracker described
in this Report has been made to enable extra linguistic changes to be quantified.
The realisation of the fundamental frequency tracker with real time operation
avoids the use of a minicomputer to implement one of the algorithms described in
section 3 which would in all probability not run in real time on a general purpose
minicomputer. However a minicomputer can be used, with the fundamental frequency
tracker interfaced to it, to perform statistical analysis on the fundamental
frequency data derived from speech samples. The tracker will be interfaced to a
PDP11/34 minicomputer system and analysis of pilots' and air traffic controllers'
speech will be carried out. By taking around 20 seconds of voiced speech as
a sample, such parameters as mean fundamental frequency, variance and standard
deviation will be calculated and some form of classification statistics applied
to investigate whether significant separation in classes can be obtained for
speech samples taken under different levels of 'strain'. If such separation can
be obtained then it may be possible to correlate changes in fundamental frequency
statistics with workload level.
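
The summary statistics mentioned above could be computed along the following lines. This is a minimal sketch assuming the tracker output has already been digitised into a list of fundamental frequency values in hertz, with unvoiced frames gated to zero; the function name and the sample data are hypothetical.

```python
import statistics

def f0_statistics(f0_hz):
    """Mean, sample variance and standard deviation over voiced frames only."""
    voiced = [f for f in f0_hz if f > 0.0]   # unvoiced frames are gated to zero
    return {
        "mean": statistics.mean(voiced),
        "variance": statistics.variance(voiced),
        "std_dev": statistics.stdev(voiced),
    }

# Hypothetical frames: two unvoiced (0.0) and three voiced.
print(f0_statistics([0.0, 100.0, 110.0, 0.0, 120.0]))
```

Excluding the unvoiced frames matters here, since including the gated zeros would bias the mean downwards and inflate the variance.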

The accuracy of the fundamental frequency tracker has yet to be checked but
there are a number of ways in which this could be done and it is hoped that a
future memorandum will report on the accuracy found using the various methods.
There are three basic methods which could be used and these will be described
briefly.

7.2 Subjective tests

The human ear and sound perception mechanism is very sensitive and may be
used to assess the accuracy of a fundamental frequency tracker by outputting the
frequency, preferably modified as a square wave or sinusoidal wave with some
harmonics, in synchronism with the speech from which it is derived.
Discontinuities in the tracked fundamental frequency will then be noticed as the
extracted 'pitch' is heard alongside the actual voice 'pitch' of the speech.

7.3  Comparison  with  laryngograph  measurements 

The  laryngograph  gives  an  accurate  record  of  the  points  of  opening  and 
closure  of  the  vocal  cords  by  an  impedance  measurement  method  across  the  larynx 
in  the  region  of  the  vocal  cords.  Such  equipment  has  been  pioneered  by 
University  College  London  and  arrangements  have  been  made  to  make  a  recording  of 
speech  and  laryngograph  measurements  simultaneously  so  that  the  same  speech  can 
be processed by the fundamental frequency tracker and the data derived from it
compared with the excitation period data derived from the laryngograph
measurements.


7.4  Comparison  with  mathematical  algorithms 

Implementation  of  some  of  the  algorithms  described  in  section  3  for  speech 
processing  and  particularly  cepstrum  analysis  will  allow  comparisons  to  be  made 
between  fundamental  frequency  derived  from  these  methods  and  the  pitch  tracker 
described in this Report.


8 CONCLUSIONS

A low cost device for tracking the fundamental frequency of speech in real
time has been achieved. It is particularly suited to processing speech embedded
in noise and will allow extensive statistical analysis of fundamental frequency
data derived from pilots' and air traffic controllers' speech to be carried out to
attempt to establish correlation between changes in fundamental frequency
statistics and 'strain' in a subject, which may be caused by workload. Other
investigations may also be carried out into the effects of vibration, noise and
acceleration on the fundamental frequency of speech.